Pattern detection

ABSTRACT

Apparatus for detecting a pattern in a data stream comprises a pattern matching device for receiving the data stream. The pattern matching device comprises one or more rule engines, each rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match. The apparatus is arranged to pass the data stream to each rule engine, and is further arranged to output a signal indicating a pattern match when a state transition rule indicates a pattern match.

FIELD OF THE INVENTION

This invention relates to an apparatus and to a method for detecting a pattern in a data stream.

BACKGROUND OF THE INVENTION

The detection of a particular pattern in a data stream is used in many computing environments. For example, in fields such as virus detection, the data stream that is being received by a computer will need to be monitored for the presence of viruses. The virus checker will be able to recognise specific viruses and also viruses of generic types. The virus checker will have access to a data structure that includes a large number of different patterns, probably over a thousand in number. The patterns can comprise simple character sequences (strings) such as “password” or can be specified in a more flexible way, for example, using regular expressions that can include generic references to character classes and to the number of occurrences of certain characters and character sequences.

A data stream that is received by a computer, and which needs to be analysed, will be formed of a series of bytes, and in common protocols such as TCP/IP (used for Internet communication) these bytes will be received in the form of data packets. The data packets that form the data stream are scanned for the presence of the stored patterns as the stream is received. This scanning can be executed by software, or in some environments a dedicated ASIC or an FPGA can be used to carry out the pattern matching. If a pattern is detected, an output signal is generated and, depending upon the application, an action such as deleting the pattern from the data packet is executed.

All known pattern matching systems have one or more weaknesses. These include a large storage requirement for the data structure, the high consumption of processing resources, the difficulty of the pattern matching working in real time on streamed data, and the difficulty in updating the data structure storing the patterns when new patterns for new viruses are to be added to the data structure.

In A. V. Aho and M. J. Corasick, “Efficient string matching: An aid to bibliographic search,” Communications of the ACM, vol. 18, no. 6, pp. 333-340, 1975, an algorithm is described for performing pattern matching by constructing a conventional state transition diagram. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the machine to process the text string in a single pass. The approach combines the ideas of the Knuth-Morris-Pratt algorithm with those of finite state machines. The storage efficiency, pattern-matching performance and update performance of this method are, however, rather limited.

SUMMARY OF THE INVENTION

It is therefore an aspect of the invention to improve upon the known art. According to a first aspect of the invention, there is provided apparatus for detecting a pattern in a data stream comprising a pattern matching device for receiving the data stream. The pattern matching device comprises one or more rule engines, each rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, the apparatus being arranged to pass the data stream to the one or more rule engines, and further arranged to output a signal indicating a pattern match when a state transition rule indicates a pattern match.

According to a second aspect of the invention, there is provided a method for detecting a pattern in a data stream comprising receiving the data stream, running one or more rule engines with each rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, passing the data stream to the one or more rule engines, and outputting a signal indicating a pattern match when a state transition rule indicates a pattern match.

According to a third aspect of the invention, there is provided a computer program product on a computer readable medium for controlling apparatus for detecting a pattern in a data stream, the computer program product comprising instructions for receiving the data stream, running one or more rule engines, each rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, passing the data stream to the one or more rule engines, and outputting a signal indicating a pattern match when a state transition rule indicates a pattern match.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects and features of the present invention will become clear from the following description, by way of example only, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic diagram of apparatus for detecting a pattern in a data stream,

FIG. 2 is a state transition diagram,

FIG. 3 is a schematic diagram of a rule engine of the apparatus of FIG. 1,

FIG. 4 is a schematic diagram of a state transition rule,

FIG. 5 is a second state transition diagram,

FIG. 6 is a schematic diagram of a further portion of the apparatus of FIG. 1,

FIG. 7 is a schematic diagram of an enhanced rule engine of the apparatus of FIG. 1,

FIG. 8 is a schematic diagram of a second state transition rule,

FIG. 9 is a third state transition diagram,

FIG. 10 is a flowchart of a pattern distribution algorithm, and

FIG. 11 is a flowchart of an algorithm for converting a pattern collection into a series of state transition rules.

DESCRIPTION OF THE INVENTION

The present invention provides methods, apparatus and systems for detecting a pattern in a data stream. An example apparatus comprises a pattern matching device for receiving the data stream. The pattern matching device comprises one or more rule engines, each rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, the apparatus being arranged to pass the data stream to the one or more rule engines, and further arranged to output a signal indicating a pattern match when a state transition rule indicates a pattern match.

There is provided a method for detecting a pattern in a data stream comprising receiving the data stream, running one or more rule engines, the one or more rule engines operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, passing the data stream to the one or more rule engines, and outputting a signal indicating a pattern match when a state transition rule indicates a pattern match.

There is further provided a computer program product on a computer readable medium for controlling apparatus for detecting a pattern in a data stream. The computer program product comprises instructions for receiving the data stream, running one or more rule engines with each rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, passing the data stream to the one or more rule engines, and outputting a signal indicating a pattern match when a state transition rule indicates a pattern match.

Owing to the invention, it is possible to provide an improved pattern matching method. The use of rule engines that execute the pattern matching based upon state transition rules with priorities facilitates a system that can operate in real time on the data stream as it is received, while making efficient use of computational and memory resources. The data structure storing the patterns can be updated in a simple and timely manner.

Advantageously, the apparatus further comprises a pattern distribution device arranged to receive the patterns, to distribute the patterns across a plurality of pattern collections, and to convert each pattern collection into a plurality of state transition rules. The pattern distribution device executes an algorithm to split the patterns into a number of pattern collections equal to the number of rule engines. Preferably, the pattern distribution device is arranged to distribute the patterns substantially evenly across the plurality of pattern collections. By splitting the patterns evenly across the collections, the most efficient use of processing resources is achieved, as each rule engine will be handling a similar number of patterns.

Ideally, the pattern distribution device is arranged, when distributing the patterns across the plurality of pattern collections, to distribute the patterns according to commonality and conflict between patterns. Commonality between patterns could be, for example, a common prefix between patterns, and conflict between patterns could be, for example, a substring of one pattern (not including the first letter) being a prefix of another pattern. By distributing the patterns across the collections such that patterns with commonality are in the same collections and patterns with conflicts are in different collections, the number of state transition rules for each rule engine is reduced (ideally even minimized), with a consequent reduction in the consumption of storage and processing resources.
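The distribution step can be sketched as a greedy assignment. This is a minimal sketch, not the FIG. 10 algorithm: the helper names (`common_prefix_len`, `conflicts`, `placement_cost`, `distribute`), the minimum overlap length `min_len = 2` used to decide that two patterns conflict, and the cost ordering (fewest conflicts first, then longest shared prefix, then smallest collection) are all assumptions made for illustration. A capacity of ceil(P/N) patterns per collection keeps the N collections roughly even.

```python
from __future__ import annotations

import math


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def conflicts(a: str, b: str, min_len: int = 2) -> bool:
    """True if a substring of `a` that does not include its first letter is a
    prefix of `b` over at least `min_len` characters (assumed threshold)."""
    return any(common_prefix_len(a[i:], b) >= min_len for i in range(1, len(a)))


def placement_cost(coll: list[str], p: str, min_len: int) -> tuple[int, int, int]:
    """Fewer conflicts, then a longer shared prefix, then a smaller collection."""
    n_conflicts = sum(conflicts(p, q, min_len) or conflicts(q, p, min_len) for q in coll)
    commonality = max((common_prefix_len(p, q) for q in coll), default=0)
    return (n_conflicts, -commonality, len(coll))


def distribute(patterns: list[str], n: int, min_len: int = 2) -> list[list[str]]:
    """Greedily assign patterns to n collections of near-equal size."""
    capacity = math.ceil(len(patterns) / n)
    collections: list[list[str]] = [[] for _ in range(n)]
    # Longest patterns first; sorted() is stable, so ties keep their input order.
    for p in sorted(patterns, key=len, reverse=True):
        open_ids = [i for i in range(n) if len(collections[i]) < capacity]
        best = min(open_ids, key=lambda i: (placement_cost(collections[i], p, min_len), i))
        collections[best].append(p)
    return collections


print(distribute(["testing", "testcase", "ingest", "casement"], 2))
# -> [['testcase', 'testing'], ['casement', 'ingest']]
```

With these four patterns the sketch keeps “testing” and “testcase” (common prefix “test”) together, and places “casement” and “ingest” in the other collection, because the prefix “case” occurs inside “testcase” and the prefix “ing” occurs inside “testing”.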

Advantageously, the apparatus further comprises a results processor for receiving output from the one or more rule engines, the results processor being arranged to determine if a pattern match has occurred. In a simple setup, the results processor can be used to collate the output signals received from the rule engines; in more complicated arrangements, it can be used to determine if a pattern match has occurred. This might happen if the original pattern is relatively complicated and it is not computationally efficient to have a single rule engine determine the pattern match. Instead, multiple engines can be used to detect different portions of the pattern, and the results processor then determines whether the original pattern is matched, based on the match results for the pattern portions. The results processor is able to check transition rules that specify additional conditions in their outputs, which may relate to the location of patterns in the data stream, the order in which multiple patterns should be detected, and the distance between the multiple patterns in the data stream.

Ideally, at least one of the state transition rules includes a character class component. Character classes can define particular groups of characters, for example, numerical and alphanumerical values. By supporting the use of character classes, complicated patterns can be transformed relatively simply into state transition rules for simple processing by a rule engine.

Advantageously, the pattern matching device comprises a plurality of rule engines. In almost all practical applications of the pattern matching device, multiple rule engines will be used, each processing the input data stream in parallel. Since the original patterns have been split into pattern collections that place conflicting patterns apart, the greater the number of rule engines, the greater the reduction in conflicts between patterns in each pattern collection. The actual number of rule engines that are used by the pattern matching device is a design choice, but suitable values for many applications would be 8 or 16 rule engines. The greater the number of engines used, the smaller the total memory demand will be. This is because the reduction in conflicts between patterns reduces the number of state transition rules needed to encode those patterns and, consequently, the amount of memory needed to store the state transition rules.

Advantageously, the rule engines are arranged in one or more pairs of rule engines, with the or each pair of rule engines processing alternate portions of the data stream, and with the results processor being arranged to combine the outputs of the or each pair of rule engines. Many different arrangements of the rule engines are possible, in a variety of parallel and serial combinations. These can be decided as design choices to increase the speed of the pattern matching, depending upon the resources available.

The pattern matching apparatus has the following functional characteristics. It supports multiple pattern types, including character strings and regular expressions. It supports multiple pattern conditions that can be specified separately for each pattern: case sensitivity, and the location at which the pattern should be detected within the input stream (typically specified using offset/depth parameters). The pattern matching apparatus will detect all patterns in the input stream, including multiple occurrences and overlapping patterns. It is scalable to support at least tens of thousands of patterns. There is no basic limitation on maximum pattern length except for memory capacity. It supports rules involving multiple patterns with interdependent conditions, for example, the order in which the patterns involved in a rule should be detected, and the distance between the locations in the input stream at which the patterns should be detected. It supports dynamic incremental updates (programmable by modifying memory contents). The apparatus is suitable for ASIC and FPGA implementation.

The performance characteristics of the apparatus include: on-the-fly (single pass) operation involving a deterministic processing rate of at least one character per clock cycle, which can be increased to multiple characters per clock cycle through different types of parallelization. It is more storage-efficient through a novel compression technique; for example, 1500 fixed match patterns extracted from a commercial intrusion detection rule set, comprising a total of 25K characters, will fit in approx. 100 KB. The apparatus has better update performance: a pattern update (insert/delete) takes approx. 1 ms-2 ms using an update function executed in software on a state-of-the-art processor. The apparatus provides the capability to activate rules within much less than 1 ms.
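As a rough check of what the quoted storage figure implies (taking 1 KB as 1000 bytes, an assumption; with 1024 bytes the results change by about 2%):

$$\frac{100\,000\ \text{bytes}}{25\,000\ \text{characters}} = 4\ \text{bytes per pattern character}, \qquad \frac{100\,000\ \text{bytes}}{1500\ \text{patterns}} \approx 67\ \text{bytes per pattern}$$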

FIG. 1 shows schematically apparatus 10 for detecting a pattern in a data stream 12. The apparatus 10 could be an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or a general processor (such as an Intel Pentium) under the control of software. The apparatus 10 has many applications, including intrusion detection. In computing, the detection and disabling of viruses and other malicious software components is desirable in any system where data is being received from the outside world via, for example, the Internet.

The apparatus 10 comprises a pattern matching device 14 for receiving the data stream 12 and carrying out the pattern matching. The pattern matching device 14 comprises a plurality of rule engines 16 a, 16 b. The operation of the rule engines is described in more detail below. In FIG. 1, the rule engines are shown as grouped into two functional components: a basic pattern matching group of rule engines 16 a and a regular expression matching group of rule engines 16 b. The apparatus 10 also includes a results processor 18 and a control device 20.

The data stream 12 received by the apparatus 10 comprises a series of bytes, which may be a continuous stream or may be in the form of data packets (as is common in Internet communication). The apparatus 10 scans the data stream 12 for the existence of specific patterns. Each rule engine 16 a, 16 b operates under a plurality of state transition rules, which encode a plurality of patterns. The apparatus 10 is arranged to pass the data stream 12 to each rule engine 16, and is further arranged to output a signal indicating a pattern match when a state transition rule indicates a pattern match.

In order to explain the relationship between the patterns and the state transition rules, FIG. 2 shows a state transition diagram for detection of the pattern “testing”. The state transition rules that encode this diagram are as follows:

Rule  Current state  Input  ->  New state  Output  Priority
R1    *              *      ->  S0         -       0
R2    *              t      ->  S1         -       1
R3    S1             e      ->  S2         -       2
R4    S2             s      ->  S3         -       2
R5    S3             t      ->  S4         -       2
R6    S4             i      ->  S5         -       2
R7    S5             n      ->  S6         -       2
R8    S6             g      ->  S0         1       2
R9    S4             e      ->  S2         -       2

The rules are generated automatically by an algorithm, which is discussed in more detail below with reference to FIG. 11. Each rule governs the operation of the rule engine by moving from a first state to a second state according to the input, with a possible output being triggered by the transition. The wildcard character * in rules R1 and R2 matches any state or input. The first state transition rule R1 includes a wildcard state component and a wildcard input component, the second state transition rule R2 includes a wildcard state component and a specified input component, and the third state transition rule R3 includes a specified state component and a specified input component. The first, second and third rules have differing priorities.

Because of the wildcards, multiple rules can match a given state and input. To resolve this, the state transition rules are assigned priorities: when multiple rules match, the rule engine acts on the rule with the highest priority. Rule R8 includes an output component indicating a pattern match, which is the numeral 1 in the output column for that rule. This set of rules returns an output of 1 for every occurrence of the string “testing”, including occurrences embedded in a longer string, and never returns an output of 1 otherwise.
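For a single pattern, rules of this shape can be derived mechanically. The following is a minimal sketch assuming a KMP-style construction, which is not necessarily the FIG. 11 algorithm: for every state and every character of the pattern it computes the state a conventional automaton would enter, and emits a priority-2 rule only where that differs from what the wildcard rules R1 and R2 would already produce. `generate_rules` is a hypothetical name; patterns must have at least two characters, and multiple patterns are not handled.

```python
from __future__ import annotations


def generate_rules(pattern: str) -> list[tuple[str, str, str, int | None, int]]:
    """Rules (state, input, next state, output, priority) for one pattern, len >= 2."""
    n = len(pattern)
    # Priority 0: default return to S0. Priority 1: the first letter always starts a match.
    rules = [("*", "*", "S0", None, 0), ("*", pattern[0], "S1", None, 1)]
    for k in range(n):  # state Sk: the last k inputs equal pattern[:k]
        # Characters absent from the pattern always lead to S0, which R1 already covers.
        for c in sorted(set(pattern)):
            text = pattern[:k] + c
            output = 1 if text == pattern else None
            # Next state: longest proper prefix of the pattern that ends the text seen so far.
            nxt = max(j for j in range(min(len(text), n - 1) + 1) if text.endswith(pattern[:j]))
            default = 1 if c == pattern[0] else 0  # state the wildcard rules would give
            if nxt != default or output is not None:
                rules.append((f"S{k}", c, f"S{nxt}", output, 2))
    return rules


for rule in generate_rules("testing"):
    print(rule)
```

For “testing” this yields exactly the nine rules of the table above, including R9 (S4, e -> S2), which is needed because the final “t” of “test” can start a new occurrence.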

FIG. 3 shows in more detail the logical working of a rule engine 16 (the component 16 a of FIG. 1, “basic pattern matching”, can contain multiple rule engines). The rule engine has three principal functional components: a transition rule memory 22, which stores rules such as those in the table above; a rule selector 24, which determines which rule applies; and a state register 26, which keeps track of the current state of the rule engine 16. According to the output component of the rules, an output 28 is generated. For example, if a portion of the data stream 12 is “testesting” (which contains only a single match with the pattern “testing”), then the rule engine 16 operating according to the rules of the table above will work as follows:

Starting state S0 (the rule engine 16 always defaults to this state),

first letter “t” rule 2 applies and moves to state S1 (rule 2 has a higher priority than rule 1 and so takes precedence; rule 5 does not apply as the current state is not S3),

second letter “e” rule 3 applies and moves to S2,

third letter “s” rule 4 applies and moves to S3,

fourth letter “t” rule 5 applies and moves to S4,

fifth letter “e” rule 9 applies and moves to S2,

sixth letter “s” rule 4 applies and moves to S3,

seventh letter “t” rule 5 applies and moves to S4,

eighth letter “i” rule 6 applies and moves to S5,

ninth letter “n” rule 7 applies and moves to S6,

tenth letter “g” rule 8 applies and moves to S0, but returns an output of 1, indicating that the pattern “testing” has been detected in the data stream 12 being passed through the rule engine 16.
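The walkthrough can be replayed in software. This sketch assumes the rules are held as Python tuples and that the rule selector finds the highest-priority matching rule by a linear scan; the hardware rule selector 24 does this in a single cycle. `select_rule` and `run` are illustrative names.

```python
# Rules from the FIG. 2 table: (current state, input, next state, output, priority).
# "*" is the wildcard; None stands for "no output".
RULES = [
    ("*", "*", "S0", None, 0),
    ("*", "t", "S1", None, 1),
    ("S1", "e", "S2", None, 2),
    ("S2", "s", "S3", None, 2),
    ("S3", "t", "S4", None, 2),
    ("S4", "i", "S5", None, 2),
    ("S5", "n", "S6", None, 2),
    ("S6", "g", "S0", 1, 2),
    ("S4", "e", "S2", None, 2),
]


def select_rule(rules, state, ch):
    """Highest-priority rule whose state and input components match (wildcards allowed)."""
    candidates = [r for r in rules if r[0] in ("*", state) and r[1] in ("*", ch)]
    return max(candidates, key=lambda r: r[4])


def run(rules, data, start="S0"):
    """Feed data through the engine; return (matches, names of the rules applied)."""
    state, matches, trace = start, [], []
    for pos, ch in enumerate(data):
        rule = select_rule(rules, state, ch)
        trace.append(f"R{rules.index(rule) + 1}")
        state = rule[2]  # the state register is updated on every input character
        if rule[3] is not None:
            matches.append((pos, rule[3]))  # (offset of last matched character, output)
    return matches, trace


matches, trace = run(RULES, "testesting")
print(trace)    # ['R2', 'R3', 'R4', 'R5', 'R9', 'R4', 'R5', 'R6', 'R7', 'R8']
print(matches)  # [(9, 1)]
```

The trace is the same sequence of rules as the ten steps listed above, with the single match reported at the tenth character.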

FIG. 4 shows a generalised form for a state transition rule 30, with the components of current state 32, input character 34, conditions 36, next state 38 and output 40. The priorities of the state transition rules are reflected in the way that the rules are stored in the transition rule memory 22 (FIG. 3). For all basic patterns, a transition rule in this format is sufficient, and all such patterns can be reduced to a series of such state transition rules 30. The output component 40 can, as shown above, return a simple value, such as the numeral 1, or may return other values that are then processed by the results processor 18.

In the simple example of FIG. 2, a single pattern “testing” is detected by the rule engine 16. Since, in most practical applications, over a thousand patterns will be monitored by the pattern matching device 14, each rule engine 16 will be monitoring for multiple patterns, perhaps in the range 50-2000. As the number of patterns monitored by a rule engine increases, the state diagram representing the detection process becomes more complicated and, as a corollary, the number of state transition rules needed to encode the diagram increases.

To illustrate this concept, FIG. 5 shows a state transition diagram for a rule engine that will detect both the patterns “testing” and “testcase”. For ease of understanding, this diagram has been simplified by the omission of the returns to S0 encoded by rule R1. The rules that encode this state diagram are as follows:

Rule  Current state  Input  ->  New state  Output  Priority
R1    *              *      ->  S0         -       0
R2    *              t      ->  S1         -       1
R3    S1             e      ->  S2         -       2
R4    S2             s      ->  S3         -       2
R5    S3             t      ->  S4         -       2
R6    S4             i      ->  S5         -       2
R7    S5             n      ->  S6         -       2
R8    S6             g      ->  S0         1       2
R9    S4             c      ->  S7         -       2
R10   S7             a      ->  S8         -       2
R11   S8             s      ->  S9         -       2
R12   S9             e      ->  S0         2       2
R13   S4             e      ->  S2         -       2

These rules encode the detection of the two patterns “testing” and “testcase”, with an output of 1 being returned if the former is detected, and an output of 2 being returned if the latter is detected. It will be appreciated that as further patterns are to be matched by the rule engine, further rules are needed to encode all of the patterns.
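A rule set like this can be sanity-checked by comparing the engine's outputs with a brute-force substring search on many random inputs. The sketch below does this for the FIG. 5 rules; the random alphabet, input lengths and seed are arbitrary choices, and agreement on random inputs increases confidence in a rule set but is not a proof of correctness.

```python
import random

# FIG. 5 rules: (current state, input, next state, output, priority); "*" is the wildcard.
RULES = [
    ("*", "*", "S0", None, 0), ("*", "t", "S1", None, 1),
    ("S1", "e", "S2", None, 2), ("S2", "s", "S3", None, 2),
    ("S3", "t", "S4", None, 2), ("S4", "i", "S5", None, 2),
    ("S5", "n", "S6", None, 2), ("S6", "g", "S0", 1, 2),
    ("S4", "c", "S7", None, 2), ("S7", "a", "S8", None, 2),
    ("S8", "s", "S9", None, 2), ("S9", "e", "S0", 2, 2),
    ("S4", "e", "S2", None, 2),
]
PATTERNS = {1: "testing", 2: "testcase"}  # output value -> pattern


def engine_matches(data: str) -> list:
    """(end offset, output) for every applied rule that has an output."""
    state, found = "S0", []
    for pos, ch in enumerate(data):
        rule = max((r for r in RULES if r[0] in ("*", state) and r[1] in ("*", ch)),
                   key=lambda r: r[4])  # highest priority wins
        state = rule[2]
        if rule[3] is not None:
            found.append((pos, rule[3]))
    return found


def naive_matches(data: str) -> list:
    """Reference answer: test every start offset for every pattern."""
    return sorted((i + len(p) - 1, pid) for pid, p in PATTERNS.items()
                  for i in range(len(data)) if data.startswith(p, i))


print(engine_matches("testcasetesting"))  # [(7, 2), (14, 1)]

rng = random.Random(0)
for _ in range(2000):
    s = "".join(rng.choice("testingca") for _ in range(rng.randint(0, 60)))
    assert engine_matches(s) == naive_matches(s), s
print("rule set agrees with naive search")
```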

Some patterns have components that are case sensitive. This can be supported in two ways, which can be combined. Firstly, case sensitivity specified at the pattern level can be resolved by allocating selected rule engines to perform case-sensitive matching, with the remaining rule engines performing case-insensitive matching. Secondly, case sensitivity specified at the character level can be dealt with by each rule engine performing both case-sensitive and case-insensitive matching.

An example of case sensitivity at the character level would be the pattern [aA]B[cC], which matches “aBc”, “ABc”, “aBC” and “ABC”. This can be detected in the rule engine by using the conditions component 36 of a state transition rule 30 to specify that a particular rule only operates when the specific case-sensitive input character is received. The rule selector 24 (FIG. 3) selects a matching rule by taking the case-sensitive/insensitive condition flag into account.
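The conditions component can be modelled as a per-rule case-sensitivity flag. In this minimal sketch, the position of the flag in the rule tuple and the helper names `input_matches` and `find` are assumptions; the first and last characters of [aA]B[cC] are compared case-insensitively and the middle character case-sensitively.

```python
from __future__ import annotations

# (current state, input, case_sensitive flag, next state, output, priority) for [aA]B[cC].
RULES = [
    ("*", "*", False, "S0", None, 0),
    ("*", "a", False, "S1", None, 1),   # [aA]: case-insensitive
    ("S1", "B", True, "S2", None, 2),   # B: case-sensitive
    ("S2", "c", False, "S0", 1, 2),     # [cC]: case-insensitive, reports a match
]


def input_matches(rule_input: str, case_sensitive: bool, ch: str) -> bool:
    """Compare an input character with a rule's input component, honouring the flag."""
    if rule_input == "*":
        return True
    if case_sensitive:
        return ch == rule_input
    return ch.lower() == rule_input.lower()


def find(data: str) -> list[int]:
    """End offsets of every match, using highest-priority rule selection."""
    state, ends = "S0", []
    for pos, ch in enumerate(data):
        rule = max(
            (r for r in RULES if r[0] in ("*", state) and input_matches(r[1], r[2], ch)),
            key=lambda r: r[5],
        )
        state = rule[3]
        if rule[4] is not None:
            ends.append(pos)
    return ends


print(find("aBc ABc aBC ABC abc"))  # [2, 6, 10, 14]; "abc" is rejected
```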

Each transition rule is stored as a transition rule vector in the transition rule memory 22. The rule selector 24 searches for the highest-priority transition rule matching the current state and input character in a single cycle.

The set of state transition rules is stored in an efficient data structure that is searched by the processing logic of the rule selector. One way of achieving this is to compile the memory and logic into a B-FSM engine, which is based on a special hash function for efficiently searching the state transition rules. This technology is described in, for example, J. van Lunteren, A. P. J. Engbersen, J. Bostian, B. Carey, and C. Larsson, “XML accelerator engine,” First International Workshop on High Performance XML Processing, in conjunction with the 13th International World Wide Web Conference (WWW2004), New York, N.Y., USA, May 2004.

A key feature of the B-FSM engine is that there is an approximately linear relation between the number of transitions and the memory size, in contrast to prior-art programmable state machines, which typically have an exponential relation between the state and input vector widths and the memory size. As a result, the B-FSM engine can support a larger number of states and wider input and output vectors, being less limited by memory size. Several optimizations, including state encoding and partitioning of the state transition diagram into multiple state clusters that are each stored in separate hash tables, allow the B-FSM engine to support larger state diagrams (e.g., 10K-100K states).
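The storage effect of wildcards and priorities can be illustrated with an ordinary hash table. This sketch uses a Python dict keyed by (state, input), not the BaRT hash function of the B-FSM, and it assumes that probing from the most specific key to the least specific one reproduces the priority order. That holds for the rule tables in this description, where specified state and input have priority 2, a wildcard state has priority 1 and the full wildcard has priority 0. The sketch also compares the number of stored entries with a dense table holding one entry per state and byte value.

```python
# FIG. 2 rules as a hash table: (state, input) -> (next state, output); "*" is the wildcard.
RULES = {
    ("*", "*"): ("S0", None),
    ("*", "t"): ("S1", None),
    ("S1", "e"): ("S2", None),
    ("S2", "s"): ("S3", None),
    ("S3", "t"): ("S4", None),
    ("S4", "i"): ("S5", None),
    ("S5", "n"): ("S6", None),
    ("S6", "g"): ("S0", 1),
    ("S4", "e"): ("S2", None),
}
NUM_STATES = 7   # S0..S6
ALPHABET = 256   # one entry per byte value in a dense table


def lookup(state: str, ch: str):
    """Probe from the most to the least specific key, mirroring priorities 2 > 1 > 0."""
    for key in ((state, ch), ("*", ch), ("*", "*")):
        if key in RULES:
            return RULES[key]
    raise KeyError((state, ch))  # unreachable while the ("*", "*") default exists


def run(data: str) -> list:
    """End offsets at which a rule with an output was applied."""
    state, ends = "S0", []
    for pos, ch in enumerate(data):
        state, output = lookup(state, ch)
        if output is not None:
            ends.append(pos)
    return ends


print(run("testesting"))                  # [9]
print(len(RULES), NUM_STATES * ALPHABET)  # 9 1792
```

Here nine rules stand in for 7 × 256 = 1792 dense table entries: the number of stored entries grows with the transitions that differ from the wildcard defaults, not with the product of states and input values.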

The B-FSM provides high performance, with a maximum rate of one state transition per cycle (for frequencies into the GHz range in state-of-the-art ASIC technology). Because the data structure is contained in conventional memory (e.g., SRAM), the B-FSM engine supports dynamic updates of the state transition diagram involving incremental addition and removal of states and transitions, which are realized by incremental modification of the data structure in the transition rule memory. Multiple state diagrams can be supported simultaneously and are selected through the start addresses of the corresponding data structures in memory.

FIG. 6 shows one example of the design of the apparatus 10, with multiple rule engines 16 placed in parallel. Each rule engine receives the data stream 12 as an input and passes an output to the results processor 18. This is the simplest embodiment, with each rule engine 16 carrying out independent pattern matching on a discrete set of patterns, and with each engine 16 working on patterns not covered by the other engines 16.

However, the rule engines 16 can be arranged in pairs, with each pair of rule engines 16 processing alternate portions of the data stream 12. One member of the pair could work on the even bytes of the data stream 12, with the other member of the pair working on the odd bytes. The results processor 18 is then arranged to combine the outputs of each pair of rule engines 16. By working on alternate bytes, the processing of the data stream 12 is sped up, at the cost of increased complexity in the engines 16 carrying out the pattern matching. The average processing rate can also be increased through an encoding of the input stream (based upon statistical information on that stream). Other arrangements of the rule engines 16 are possible, including having the engines 16 working in series, with different aspects of a pattern match being carried out by different rule engines. This is particularly advantageous when detecting more complicated patterns.

The results processor 18 can provide support for rules involving multiple patterns, such as checking the occurrences, order and offsets/distances of multiple patterns. The output of the (multiple) rule engines comprises the identifiers of the patterns that have been detected in the input stream, combined with the offsets at which they have been detected. The results processor can then (based on a data structure stored in a local memory, not shown) check rules specifying additional conditions regarding the location where patterns should be detected (e.g., an exact location, a certain portion, or anywhere in the input stream), as well as conditions regarding the order in which multiple patterns should be detected and the distance between them (i.e., between their offsets).
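The checks performed by the results processor can be sketched over the (pattern identifier, offset) pairs emitted by the rule engines. The rule types `OrderRule` and `DepthRule`, their fields, and the convention that distance is measured between detection offsets are hypothetical; the description does not define the format of the data structure in the local memory.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRule:
    """Hypothetical multi-pattern rule: `second` must follow `first` within `max_distance`."""
    first: int
    second: int
    max_distance: int


@dataclass(frozen=True)
class DepthRule:
    """Hypothetical location rule: `pattern` must be detected at an offset below `depth`."""
    pattern: int
    depth: int


def check_order(rule: OrderRule, detections: list[tuple[int, int]]) -> bool:
    """True if some `second` detection lies after a `first` one, within the distance."""
    firsts = [off for pid, off in detections if pid == rule.first]
    seconds = [off for pid, off in detections if pid == rule.second]
    return any(0 < s - f <= rule.max_distance for f in firsts for s in seconds)


def check_depth(rule: DepthRule, detections: list[tuple[int, int]]) -> bool:
    """True if the pattern was detected early enough in the input stream."""
    return any(pid == rule.pattern and off < rule.depth for pid, off in detections)


# (pattern identifier, offset) pairs as they might arrive from several rule engines.
detections = [(1, 10), (2, 30), (1, 200)]
print(check_order(OrderRule(1, 2, 50), detections))  # True: 30 - 10 = 20
print(check_order(OrderRule(2, 1, 50), detections))  # False: 200 - 30 = 170
print(check_depth(DepthRule(2, 16), detections))     # False: detected at offset 30
```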

In order to test for the presence of regular expressions within a data stream, more complicated state transition rules and a more complicated rule engine are used. A feature of the advanced state transition rule is the ability to specify a character class in place of a specific input character. For example, in rule R2 above the input is the letter “t”. However, the advanced rule engine shown in FIG. 7 includes a character classifier 42, which classifies each byte in the input stream 12, so that the state transition rule used by the rule selector 24 may test the character class rather than the actual input character. Examples of character classes include:

\d numeric (“digit”)

\D not numeric

\w alphanumeric

\W not alphanumeric

\s whitespace (space, carriage return, tab, new line, form feed)

\S not whitespace.

These operators can be specified in the state transition rules instead of the wildcard of rule R1 or the specified inputs of the other rules.
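A character classifier can be sketched as a function from an input byte to the set of classes it belongs to. Following the list above, \w is taken here to mean ASCII alphanumeric (in Perl and Python regular expressions \w also includes the underscore), and \s covers exactly the five listed whitespace characters. `classify` and `input_matches` are illustrative names, and a rule input consisting of a literal backslash is not supported by this sketch.

```python
from __future__ import annotations

# Whitespace as listed above: space, carriage return, tab, new line, form feed.
WHITESPACE = frozenset(b" \r\t\n\f")


def classify(byte: int) -> frozenset[str]:
    """Character classes of one input byte."""
    is_digit = 0x30 <= byte <= 0x39
    is_alnum = is_digit or 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A
    return frozenset({
        r"\d" if is_digit else r"\D",
        r"\w" if is_alnum else r"\W",
        r"\s" if byte in WHITESPACE else r"\S",
    })


def input_matches(rule_input: str, byte: int) -> bool:
    """A rule input is '*', a class such as r'\s', or a literal single character."""
    if rule_input == "*":
        return True
    if rule_input.startswith("\\"):
        return rule_input in classify(byte)
    return byte == ord(rule_input)


print(input_matches(r"\s", ord("\t")))  # True
print(input_matches(r"\d", ord("x")))   # False
```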

The rule engine 16 of FIG. 7 also includes a counter array 44 as a functional component. The counter array is used in specific situations where a pattern that is being detected includes an expression along the lines of “no \s (white space) in the next 100 characters”. To detect this type of expression, the advanced state transition rule 22 of FIG. 8 includes a counter control component 46, which can specify the length of any count and the conditions attached to the count. Once the rule selector selects a rule that includes an active counter control component 46, the counter array 44 in the rule engine 16 executes the counting function and controls the corresponding output of the rule engine 16, according to whether there is a match against the particular pattern.
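The behaviour of the counter array for an expression such as “no \s in the next N characters” can be sketched as follows. The semantics are an assumption, since the description does not specify them: a trigger byte (standing in for the rule whose counter control component is active) loads a new counter with N, each later non-whitespace byte decrements every active counter, any whitespace byte cancels them all, and a counter reaching zero reports a match. `scan` is a hypothetical name.

```python
from __future__ import annotations

WHITESPACE = frozenset(b" \r\t\n\f")


def scan(data: bytes, trigger: int, length: int) -> list[int]:
    """Offsets at which `length` non-whitespace bytes have followed a trigger byte."""
    counters: list[int] = []  # remaining counts, one entry per active counter
    matches: list[int] = []
    for pos, b in enumerate(data):
        if b in WHITESPACE:
            counters.clear()  # condition violated: cancel every active count
        else:
            counters = [c - 1 for c in counters]
            if 0 in counters:
                matches.append(pos)  # N non-whitespace bytes seen since a trigger
            counters = [c for c in counters if c > 0]
        if b == trigger:
            counters.append(length)  # counter-controlled rule fired: start a new count
    return matches


print(scan(b"X" + b"a" * 100, ord("X"), 100))                    # [100]
print(scan(b"X" + b"a" * 50 + b" " + b"a" * 60, ord("X"), 100))  # []
```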

FIG. 9 illustrates a state transition diagram for a set of rules that encode a regular expression. The regular expression that is represented by this diagram is:

“\sCREATE\s*\{”

This would detect such strings in the inputted data stream 12 as:

“<space>CREATE {”, and

“<newline>create <tab> {”

For ease of understanding, this diagram has been simplified by omitting the returns to S0 encoded by rule R1, as with FIG. 5. The rules that encode this state diagram are as follows:

Rule  Current state  Input  ->  Next state  Output  Priority
R1    *              *      ->  S0          —       0
R2    *              \s     ->  S1          —       1
R3    S1             c      ->  S2          —       2
R4    S2             r      ->  S3          —       2
R5    S3             e      ->  S4          —       2
R6    S4             a      ->  S5          —       2
R7    S5             t      ->  S6          —       2
R8    S6             e      ->  S7          —       2
R9    S7             \s     ->  S8          —       2
R10   S7             {      ->  S0          1       2
R11   S8             \s     ->  S8          —       2
R12   S8             {      ->  S0          1       2
R13   S8             c      ->  S2          —       2

This state transition diagram and the rules above detect the patterns that match the regular expression above, including strings of the type listed above. They return an output of 1 when a pattern match is detected.
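The rule set above can be exercised with a small software model of a rule engine. This is a sketch under stated assumptions and does not reproduce the hardware of FIG. 7. For each input character, the model selects the highest-priority rule among those whose state and input components match. Literal letters are treated as case-insensitive, because the rules use lower-case letters yet are stated to match “CREATE”. Only the \s class is supported. The function name `scan` is hypothetical.

```python
WHITESPACE = " \r\t\n\f"

# (rule, current state, input, next state, output, priority); "*" is a wildcard
RULES = [
    ("R1", "*", "*", "S0", None, 0),
    ("R2", "*", r"\s", "S1", None, 1),
    ("R3", "S1", "c", "S2", None, 2),
    ("R4", "S2", "r", "S3", None, 2),
    ("R5", "S3", "e", "S4", None, 2),
    ("R6", "S4", "a", "S5", None, 2),
    ("R7", "S5", "t", "S6", None, 2),
    ("R8", "S6", "e", "S7", None, 2),
    ("R9", "S7", r"\s", "S8", None, 2),
    ("R10", "S7", "{", "S0", 1, 2),
    ("R11", "S8", r"\s", "S8", None, 2),
    ("R12", "S8", "{", "S0", 1, 2),
    ("R13", "S8", "c", "S2", None, 2),
]


def input_matches(spec: str, ch: str) -> bool:
    if spec == "*":
        return True
    if spec == r"\s":
        return ch in WHITESPACE
    return spec == ch.lower()  # assumption: literal letters match case-insensitively


def scan(text: str) -> list:
    """Return the positions at which a rule with output 1 is selected."""
    state, hits = "S0", []
    for pos, ch in enumerate(text):
        candidates = [r for r in RULES
                      if r[1] in ("*", state) and input_matches(r[2], ch)]
        rule = max(candidates, key=lambda r: r[5])  # R1 always matches
        state = rule[3]
        if rule[4] == 1:
            hits.append(pos)
    return hits
```

With this model, `scan("\ncreate \t{")` reports a match at the final “{”. `scan("CREATE {")` reports none, because the expression begins with \s, so a whitespace character must precede “CREATE”.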

The apparatus 10 also includes a pattern distribution device (which could form part of the control 20), although more usually the pattern distribution is executed by a software component. This device is arranged to receive the patterns that are to be detected by the apparatus 10. The pattern distribution device is arranged to distribute the patterns across a plurality of pattern collections, and to convert each pattern collection into a plurality of state transition rules. This two-part process is executed by two separate algorithms. The first distributes the patterns evenly across the plurality of pattern collections. The second then converts each collection into a series of state transition rules. When distributing the patterns across the plurality of pattern collections, the pattern distribution device is arranged to distribute them according to commonality and conflict between patterns.

A summary of the pattern compiler operation is:

- Step 1: The pattern distribution algorithm distributes the patterns over N pattern collections, where N corresponds to the number of B-FSM engines.
- Step 2: The transition rule generator algorithm converts each pattern collection into an enhanced state transition diagram comprised of state transition rules involving wildcards and priorities (including resolution of intra- and inter-pattern conflicts, and case-sensitivity).
- Step 3: The B-FSM compiler algorithm converts each of the N enhanced state transition diagrams into a storage-efficient B-FSM data structure for each of the N B-FSM engines (including state clustering, state encoding and BaRT compression).
- All steps support incremental updates.

FIG. 10 details an example of an algorithm that separates the list of patterns into separate pattern collections. In general terms, the patterns are distributed over N pattern collections.

The description of the pattern distribution algorithm uses the terms “common prefix” and “pattern conflict”, which are defined as follows:

Common prefix: two patterns are said to have a common prefix of length k if the first k characters of both patterns are identical. Example: the patterns “testing” and “testcase” have a common prefix “test” with a length of 4 characters.

Pattern conflict: a conflict exists between two patterns if a substring of one pattern, not including its first character, is a prefix of the other pattern. Example: two conflicts exist between the patterns “testing” and “pattern”. (1) The single-character string comprised of the third character of “pattern”, namely “t”, is a prefix of “testing”. (2) The string formed by the fourth and fifth characters of “pattern”, namely “te”, is also a prefix of “testing”.
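Both definitions can be expressed directly in code. The sketch below is illustrative only, and the function names are hypothetical. For each start position other than the first, it reports the longest substring that is a prefix of the other pattern. This reproduces the two conflicts in the example.

```python
def common_prefix(a: str, b: str) -> str:
    """Longest string that is a prefix of both a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def one_way_conflicts(a: str, b: str) -> list:
    """For each start position j >= 1 in a, the longest a[j:j+k] that is a prefix of b."""
    found = []
    for j in range(1, len(a)):
        k = len(common_prefix(a[j:], b))
        if k:
            found.append(a[j:j + k])
    return found


def conflicts(a: str, b: str) -> list:
    """Pattern conflicts in both directions, as defined above."""
    return one_way_conflicts(a, b) + one_way_conflicts(b, a)
```

For the example patterns, `common_prefix("testing", "testcase")` gives “test”, and `conflicts("testing", "pattern")` gives “t” and “te”.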

The objective of the pattern distribution algorithm is to distribute the patterns over N pattern collections such that three conditions hold. (1) The number of characters that are part of a common prefix in each collection is increased, ideally maximized. (2) The number of conflicts between the patterns in each collection is reduced, ideally minimized. (3) The accumulated pattern length is similar for all collections, giving an even distribution of the pattern characters over the collections.

Because these three conditions are independent and may even conflict, weights can be assigned to each of them to express their relative importance. In many pattern-matching applications, patterns are added in a given order. For example, in an intrusion detection application, patterns are added in the order in which new viruses or worms have been identified and appropriate rules have been created to detect them.

Based on this property, a first embodiment of the pattern distribution algorithm comprises the following steps. The steps are repeated for each pattern, in the order in which the patterns are added, to place each one in one of the N pattern collections that will be detected by the N rule engines:

- Step 1: Determine the longest common prefix between the new pattern to be added and any pattern in each of the N pattern collections, which contain the patterns that have already been distributed. The length of the longest common prefix for collection i is represented by p_i, with 1 ≤ i ≤ N.
- Step 2: Determine the total number of conflicts between the new pattern to be added and any pattern in each of the N pattern collections. Conflicts with portions of patterns that are part of a common prefix are counted only once. The total number of conflicts with patterns in collection i is represented by c_i, with 1 ≤ i ≤ N. The total number of characters (i.e., the accumulated pattern length) in collection i is represented by m_i, with 1 ≤ i ≤ N.
- Step 3: Determine for each pattern collection a weighted sum of the three parameters, using three weights w₁, w₂ and w₃, as follows:

  $$S_i = (w_1 \cdot p_i) - (w_2 \cdot c_i) - (w_3 \cdot m_i)$$

- Step 4: Add the pattern to the collection i that has the largest value S_i.
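The four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. The weights are arbitrary example values, and p_i is taken as the length of the longest common prefix. Conflicts are counted once per start position, which is consistent with the “testing”/“pattern” example. Conflicts involving a shared common prefix are counted once by collecting distinct prefixes. All function names are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def count_conflicts(new: str, collection: list) -> int:
    """Conflicts between `new` and a collection, counted once per start position."""
    if not new:
        return 0
    # Substrings of `new` (not starting at its first character) that begin a stored pattern.
    first_chars = {q[0] for q in collection if q}
    into = sum(1 for ch in new[1:] if ch in first_chars)
    # Substrings of stored patterns (not at their first character) that begin `new`;
    # collecting distinct prefixes counts a shared common prefix only once.
    ends = {q[:j + 1] for q in collection for j in range(1, len(q)) if q[j] == new[0]}
    return into + len(ends)


def distribute(patterns, n, w1=1.0, w2=1.0, w3=0.1):
    """Place each pattern, in arrival order, in the collection with the largest S_i."""
    collections = [[] for _ in range(n)]
    for pat in patterns:
        scores = []
        for coll in collections:
            p = max((common_prefix_len(pat, q) for q in coll), default=0)  # step 1
            c = count_conflicts(pat, coll)                                 # step 2
            m = sum(len(q) for q in coll)
            scores.append(w1 * p - w2 * c - w3 * m)                       # step 3
        best = scores.index(max(scores))  # step 4 (ties go to the lowest index)
        collections[best].append(pat)
    return collections
```

With these illustrative weights and N = 2, “testing” and “testcase” share the first collection because of their common prefix “test”. “pattern” is placed in the second collection.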

The above four steps are shown in the flow chart of FIG. 10. This is only one example of an algorithm for spreading the patterns across the pattern collections. Other possible variations of this process include:

- (1) The use of fewer or more parameters to determine the collection to which a pattern is added.
- (2) Using functions instead of fixed values as weights. For example, weight w₃ in the above four steps could be a function that depends on the number of patterns/characters in a given collection in combination with a certain upper limit (e.g., based on the actual memory included in the rule engine that will be used to store this pattern collection). This would allow the realisation of an algorithm in which the fill rate is only taken into account when the number of patterns approaches the upper limit (the actual size of the memory), by increasing the weight in that situation.
- (3) Different orders of insertion, for example, by first sorting the patterns by length.
- (4) Implementations of search structures that allow efficient determination of the longest common prefix and pattern conflicts, for example, tree structures and hash table structures for determining longest matching prefixes.
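As an illustration of variation (4), a prefix tree (trie) built over one pattern collection can find the longest common prefix between a new pattern and the whole collection. The time needed is proportional to the length of the new pattern, instead of requiring a comparison against every stored pattern. The class below is a hypothetical minimal sketch.

```python
class PrefixTrie:
    """Minimal prefix tree over the patterns of one collection (hypothetical helper)."""

    def __init__(self):
        self.root = {}

    def insert(self, pattern: str) -> None:
        node = self.root
        for ch in pattern:
            node = node.setdefault(ch, {})  # one child dict per character

    def longest_common_prefix(self, pattern: str) -> int:
        """Length of the longest prefix of `pattern` shared with any stored pattern."""
        node, length = self.root, 0
        for ch in pattern:
            if ch not in node:
                break
            node, length = node[ch], length + 1
        return length
```

After inserting “testing” and “pattern”, a query for “testcase” returns 4.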

A second embodiment of the pattern distribution algorithm uses a “brute force” approach. For each new pattern and each pattern collection, the actual memory requirements are determined by applying the transition rule generation algorithm and the B-FSM algorithm. The collection is then selected for which the actual memory requirements are lowest (ideally minimal) while remaining within the storage capacity of the memory that is part of the corresponding rule engine. This approach achieves reduced memory requirements for the given order of inserting patterns. However, it takes more time to select the pattern collection to which a new pattern is added, and consequently results in slower update performance than the first embodiment described above.
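A sketch of this brute-force selection follows. The transition rule generation and B-FSM algorithms are not reproduced here. The sketch therefore uses a hypothetical stand-in for the memory requirement: the number of distinct prefixes, with one state per prefix. It also interprets “lower memory requirements” as the smallest increase in that estimate. Both choices are assumptions, and the function names are illustrative.

```python
def state_count(collection) -> int:
    """Hypothetical memory proxy: one state per distinct non-empty prefix."""
    return len({p[:k] for p in collection for k in range(1, len(p) + 1)})


def assign_brute_force(patterns, n, capacity, cost=state_count):
    """Add each pattern to the collection whose estimated memory grows least,
    skipping collections whose estimate would exceed `capacity`."""
    collections = [[] for _ in range(n)]
    for pat in patterns:
        best = None  # (increase, index)
        for i, coll in enumerate(collections):
            new_cost = cost(coll + [pat])
            if new_cost > capacity:
                continue  # would not fit in this rule engine's memory
            increase = new_cost - cost(coll)
            if best is None or increase < best[0]:
                best = (increase, i)
        if best is None:
            raise ValueError(f"no collection can hold {pat!r}")
        collections[best[1]].append(pat)
    return collections
```

Here the `cost` parameter is the place where a real memory estimate (for example, the size of the compiled B-FSM structure) would be substituted.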

Once all patterns have been added to the pattern collections, each pattern collection is converted into a set of state transition rules. Two approaches for generating the state transition rules are described below. The first approach is shown in the flowchart of FIG. 11.

Approach 1: Convert patterns into a list of states, and generate transition rules based on the pattern prefixes that are associated with each state.

Create list of states: steps 1-2

- Step 1: Convert each pattern comprised of N characters into a list of N states, such that each state is associated with a different prefix of the pattern, of size 1, 2, . . . N respectively. Note: the last state will be associated with the original pattern (prefix size = pattern size N).
- Step 2: Remove duplicate states, i.e., states that are associated with exactly the same pattern prefixes.

Generate transition rules: steps 3-5

- Step 3: Create a default transition rule to state S0, involving a wildcard condition for both the current state and the input, and having priority 0.
- Step 4: Search the list of states for states that are associated with a prefix comprised of a single character. Create a transition rule to each of these states, involving a wildcard for the current state and the single-character prefix as input value, and having priority 1.
- Step 5: Search the list of states for pairs of states (S_i, S_j) with the following property: the prefix associated with state S_i, or the last part of that prefix, equals the prefix associated with state S_j after removal of its last character. For each of these pairs, create a transition rule from state S_i to state S_j involving the last character of the prefix associated with state S_j as input value, and having priority 2.
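Approach 1 can be sketched in Python as follows. The sketch is illustrative and its names are hypothetical. States are numbered in the order in which the prefixes are generated, so the numbering matches the example below. One assumption is added to step 5. When several target states qualify for the same state and input character, only the one with the longest matching prefix is kept. This is what the example rule list below reflects: it contains S19 e −> S20 but not S19 e −> S2. Rules whose next state completes a pattern receive that pattern as output, as described after the example.

```python
def generate_rules(patterns):
    """Return rules as (current, input, next, output, priority); "*" is a wildcard, 0 is S0."""
    # Steps 1-2: one state per prefix, numbered consecutively; duplicate prefixes keep
    # the number of their first occurrence, so the removed numbers leave gaps.
    states, number = {}, 0
    for pat in patterns:
        for k in range(1, len(pat) + 1):
            number += 1
            states.setdefault(pat[:k], number)
    complete = set(patterns)

    # Step 3: default rule to S0.
    rules = [("*", "*", 0, None, 0)]

    # Step 4: wildcard-state rules into the single-character prefixes.
    for prefix, s in states.items():
        if len(prefix) == 1:
            rules.append(("*", prefix, s, prefix if prefix in complete else None, 1))

    # Step 5: S_i -> S_j when prefix(S_i), or its last part, equals prefix(S_j) minus
    # its last character; keep the longest such S_j per (S_i, input) pair (assumption).
    best = {}
    for src in states:
        for dst in states:
            if len(dst) >= 2 and src.endswith(dst[:-1]):
                key = (src, dst[-1])
                if key not in best or len(dst) > len(best[key]):
                    best[key] = dst
    for (src, ch), dst in best.items():
        rules.append((states[src], ch, states[dst], dst if dst in complete else None, 2))
    return rules
```

For the three example patterns, `generate_rules(["testing", "testcase", "pattern"])` returns 22 rules.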

The described approach will now be illustrated using an example involving the detection of all occurrences of three patterns, “testing”, “testcase” and “pattern”, that can occur anywhere in the input stream.

- Step 1: Create the list of states with associated prefixes. This results in the following states and prefixes for the three patterns:
  state=S1 pattern=“t”
  state=S2 pattern=“te”
  state=S3 pattern=“tes”
  state=S4 pattern=“test”
  state=S5 pattern=“testi”
  state=S6 pattern=“testin”
  state=S7 pattern=“testing”
  state=S8 pattern=“t”
  state=S9 pattern=“te”
  state=S10 pattern=“tes”
  state=S11 pattern=“test”
  state=S12 pattern=“testc”
  state=S13 pattern=“testca”
  state=S14 pattern=“testcas”
  state=S15 pattern=“testcase”
  state=S16 pattern=“p”
  state=S17 pattern=“pa”
  state=S18 pattern=“pat”
  state=S19 pattern=“patt”
  state=S20 pattern=“patte”
  state=S21 pattern=“patter”
  state=S22 pattern=“pattern”
- Step 2: Remove duplicate states. States S8, S9, S10 and S11 are removed because they are equal to states S1, S2, S3 and S4 respectively. The state list after step 2 now reads:
  state=S1 pattern=“t”
  state=S2 pattern=“te”
  state=S3 pattern=“tes”
  state=S4 pattern=“test”
  state=S5 pattern=“testi”
  state=S6 pattern=“testin”
  state=S7 pattern=“testing”
  state=S12 pattern=“testc”
  state=S13 pattern=“testca”
  state=S14 pattern=“testcas”
  state=S15 pattern=“testcase”
  state=S16 pattern=“p”
  state=S17 pattern=“pa”
  state=S18 pattern=“pat”
  state=S19 pattern=“patt”
  state=S20 pattern=“patte”
  state=S21 pattern=“patter”
  state=S22 pattern=“pattern”
- Step 3: Create the default rule.

Transition rule list after step 3:

Rule  Current state  Input  −>  Next state  Output  Priority
R1    *              *      −>  S0          —       0

Step 4: Search for states that are associated with a single-character prefix, and create a transition rule to each of these states with a wildcard current state and priority 1. Two states have a single-character prefix: S1 and S16. After a transition rule has been created for each of these states, the transition rule list is:

Rule  Current state  Input  −>  Next state  Output  Priority
R1    *              *      −>  S0          —       0
R2    *              t      −>  S1          —       1
R3    *              p      −>  S16         —       1

Step 5: Search for pairs of states (S_i, S_j) with the following property: the prefix associated with state S_i, or the last part of that prefix, equals the prefix associated with state S_j after removal of its last character. Create a transition rule for each of these pairs.

State S1 and state S2 form a pair of states with this property: the prefix associated with state S1 (“t”) equals the prefix associated with state S2 after removal of its last character (“t”). As a result, a transition rule is created from state S1 to S2 with priority 2. Its input is the last character of the prefix associated with state S2 (“e”):
R S1 e −> S2 — 2

State S20 and state S3 also form a pair with this property. The last part of the prefix associated with state S20 (“patte”), namely “te”, equals the prefix associated with state S3 after removal of its last character (“te”). As a result, a transition rule is created from state S20 to S3 with priority 2. Its input is the last character of the prefix associated with state S3 (“s”):
R S20 s −> S3 — 2

After all pairs of states with this property have been found and the corresponding transition rules have been created, the transition rule list is:

Rule  Current state  Input  −>  Next state  Output  Priority
R1    *              *      −>  S0          —       0
R2    *              t      −>  S1          —       1
R3    *              p      −>  S16         —       1
R4    S1             e      −>  S2          —       2
R5    S2             s      −>  S3          —       2
R6    S3             t      −>  S4          —       2
R7    S4             i      −>  S5          —       2
R8    S5             n      −>  S6          —       2
R9    S6             g      −>  S7          —       2
R10   S4             c      −>  S12         —       2
R11   S12            a      −>  S13         —       2
R12   S13            s      −>  S14         —       2
R13   S14            e      −>  S15         —       2
R14   S16            a      −>  S17         —       2
R15   S17            t      −>  S18         —       2
R16   S18            t      −>  S19         —       2
R17   S19            e      −>  S20         —       2
R18   S20            r      −>  S21         —       2
R19   S21            n      −>  S22         —       2
R20   S4             e      −>  S2          —       2
R21   S18            e      −>  S2          —       2
R22   S20            s      −>  S3          —       2

After the state transition rules have been generated as described above, output components are assigned to the state transition rules that correspond to the last characters of the converted patterns. In the above example, state transition rules R9, R13 and R19 are assigned output components corresponding to the respective patterns “testing”, “testcase” and “pattern”.
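To show how the generated rules detect the three patterns, the 22-rule list above can be run through a simple software model of a rule engine. The model assumes that, for every input character, the engine selects the highest-priority rule whose current-state and input components match. R9, R13 and R19 carry the output components. The function name `scan` is hypothetical.

```python
# The 22 rules listed above: (rule, current, input, next, output, priority)
RULES = [
    ("R1", "*", "*", "S0", None, 0),
    ("R2", "*", "t", "S1", None, 1),
    ("R3", "*", "p", "S16", None, 1),
    ("R4", "S1", "e", "S2", None, 2),
    ("R5", "S2", "s", "S3", None, 2),
    ("R6", "S3", "t", "S4", None, 2),
    ("R7", "S4", "i", "S5", None, 2),
    ("R8", "S5", "n", "S6", None, 2),
    ("R9", "S6", "g", "S7", "testing", 2),
    ("R10", "S4", "c", "S12", None, 2),
    ("R11", "S12", "a", "S13", None, 2),
    ("R12", "S13", "s", "S14", None, 2),
    ("R13", "S14", "e", "S15", "testcase", 2),
    ("R14", "S16", "a", "S17", None, 2),
    ("R15", "S17", "t", "S18", None, 2),
    ("R16", "S18", "t", "S19", None, 2),
    ("R17", "S19", "e", "S20", None, 2),
    ("R18", "S20", "r", "S21", None, 2),
    ("R19", "S21", "n", "S22", "pattern", 2),
    ("R20", "S4", "e", "S2", None, 2),
    ("R21", "S18", "e", "S2", None, 2),
    ("R22", "S20", "s", "S3", None, 2),
]


def scan(text: str) -> list:
    """Return the patterns reported by output components, in order of detection."""
    state, found = "S0", []
    for ch in text:
        candidates = [r for r in RULES if r[1] in ("*", state) and r[2] in ("*", ch)]
        rule = max(candidates, key=lambda r: r[5])  # highest priority wins
        state = rule[3]
        if rule[4]:
            found.append(rule[4])
    return found
```

In the input “pattesting”, the sequence “patte” is followed by “s”. Rule R22 therefore moves the engine to S3 (“tes”), and “testing” is still detected.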

The second approach is to convert patterns into transition rules and resolve collisions by direct processing of the transition rules.

Step 1: A default transition rule is created to state S0, involving a wildcard condition for both the current state and the input, and having priority 0.

Steps 2-3 are applied to each pattern:

Step 2: Parse the next pattern to be converted using the transition rules that have already been created, until a transition to state S0 (the default transition rule) is made. In other words, use the next pattern as the “input stream” and process it with the existing transition rules. This determines the longest common prefix with any pattern that has already been converted, since transition rules already exist for the characters comprising this common prefix.

Step 3: Next, create a transition rule, with a new unique next state, for each character in the pattern that is not part of the common prefix determined in step 2.

If there was no common prefix, the transition rule corresponding to the first character of the pattern contains a wildcard for the current state and has priority 1. Each transition rule for the other characters has priority 2. Its current state equals the next state of the transition rule corresponding to the previous character in the pattern.

Step 4: Pattern collisions are resolved as follows. For each “priority 1” transition rule (wildcard current state), it is checked whether “priority 2” transition rules (non-wildcard current state) exist that involve the same input value. Assume that a given “priority 1” transition rule involves a transition to a next state n1. Assume further that a “priority 2” transition rule with the same input character involves a transition to a next state n2. Then, for every transition rule with a current state equal to n1, a copy is created with current state n2. A transition rule may already exist with the same current state (n2) and input value as one of the newly copied rules. In that case, the copied rule is removed, and the same operation is applied iteratively to the next states of the two “colliding” rules. This operation is repeated until no more collisions are found.
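Approach 2 can be sketched in Python as follows, under stated assumptions. The function name is hypothetical, and rules are held as (current, input, next, priority) tuples, with states numbered as in the example below. In step 2, the common prefix is found by following the priority 1 rule for the first character and exact-state rules afterwards, so that a wildcard rule cannot extend the prefix by accident. Step 4 processes state pairs through a work list and records the pairs already handled so that it terminates. Output components are not assigned here.

```python
def convert(patterns):
    """Hypothetical sketch of approach 2; rules are (current, input, next, priority)."""
    rules = [("*", "*", 0, 0)]  # Step 1: default rule to S0
    next_state = 1
    for pat in patterns:
        # Step 2: parse the pattern with the existing rules to find the common prefix.
        state, k = "*", 0
        for ch in pat:
            hit = [r for r in rules if r[0] == state and r[1] == ch and r[3] > 0]
            if not hit:
                break  # the default rule would now lead back to S0
            state, k = hit[0][2], k + 1
        # Step 3: new rules, each with a new unique next state, for the rest of the pattern.
        for ch in pat[k:]:
            rules.append((state, ch, next_state, 1 if state == "*" else 2))
            state, next_state = next_state, next_state + 1
    # Step 4: resolve collisions between priority 1 and priority 2 rules.
    done = set()
    while True:
        work = [(r1[2], r2[2]) for r1 in rules if r1[3] == 1
                for r2 in rules if r2[3] == 2 and r2[1] == r1[1]]
        work = [(a, b) for a, b in work if a != b and (a, b) not in done]
        if not work:
            return rules
        while work:
            n1, n2 = work.pop(0)
            if n1 == n2 or (n1, n2) in done:
                continue
            done.add((n1, n2))
            for _, ch, nxt, _ in [r for r in rules if r[0] == n1]:
                clash = [r for r in rules if r[0] == n2 and r[1] == ch]
                if clash:
                    work.append((nxt, clash[0][2]))  # drop the copy, iterate on next states
                else:
                    rules.append((n2, ch, nxt, 2))  # copy with current state n2
```

For the three example patterns, the sketch produces 22 rules. The last three are S4 e −> S2, S14 e −> S2 and S16 s −> S3, where the last is the copy of rule 4 made when resolving the collision at state S15.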

The described approach will now be illustrated using an example involving the detection of all occurrences of three patterns, “testing”, “testcase” and “pattern”, that can occur anywhere in the input stream.

Step 1: create default rule.

Transition rule list after step 1:

Rule  Current state  Input  −>  Next state  Output  Priority
R1    *              *      −>  S0          —       0

Convert pattern “testing”:

Step 2: Parse “testing” using the existing rules.

Only one default rule exists to state S0, therefore no common prefix.

Step 3: Create a transition rule for each character that is not part of the common prefix: rules 2-8. Transition rule list after converting “testing”:

Rule  Current state  Input  −>  Next state  Output  Priority
R1    *              *      −>  S0          —       0
R2    *              t      −>  S1          —       1
R3    S1             e      −>  S2          —       2
R4    S2             s      −>  S3          —       2
R5    S3             t      −>  S4          —       2
R6    S4             i      −>  S5          —       2
R7    S5             n      −>  S6          —       2
R8    S6             g      −>  S7          —       2

Convert pattern “testcase”:

Step 2: Parse “testcase” using existing rules.

The first four characters of “testcase” are parsed by transition rules 2, 3, 4 and 5 (in this order), while the fifth character would cause a transition to state S0.

Consequently a common prefix “test” exists.

Step 3: Create a transition rule for each character that is not part of the common prefix: rules 9-12. Transition rule list after converting the remaining portion of the pattern, namely “case”:

Rule  Current state  Input  −>  Next state  Output  Priority
R1    *              *      −>  S0          —       0
R2    *              t      −>  S1          —       1
R3    S1             e      −>  S2          —       2
R4    S2             s      −>  S3          —       2
R5    S3             t      −>  S4          —       2
R6    S4             i      −>  S5          —       2
R7    S5             n      −>  S6          —       2
R8    S6             g      −>  S7          —       2
R9    S4             c      −>  S8          —       2
R10   S8             a      −>  S9          —       2
R11   S9             s      −>  S10         —       2
R12   S10            e      −>  S11         —       2

Convert pattern “pattern”:

Step 2: Parse “pattern” using existing rules.

The first character of “pattern” causes a transition to state S0. Consequently, no common prefix exists.

Step 3: Create a transition rule for each character that is not part of the common prefix: rules 13-19. Because there is no common prefix, rule 13 has a wildcard current state and priority 1. Transition rule list after converting the complete pattern “pattern”:

Rule  Current state  Input  −>  Next state  Output  Priority
R1    *              *      −>  S0          —       0
R2    *              t      −>  S1          —       1
R3    S1             e      −>  S2          —       2
R4    S2             s      −>  S3          —       2
R5    S3             t      −>  S4          —       2
R6    S4             i      −>  S5          —       2
R7    S5             n      −>  S6          —       2
R8    S6             g      −>  S7          —       2
R9    S4             c      −>  S8          —       2
R10   S8             a      −>  S9          —       2
R11   S9             s      −>  S10         —       2
R12   S10            e      −>  S11         —       2
R13   *              p      −>  S12         —       1
R14   S12            a      −>  S13         —       2
R15   S13            t      −>  S14         —       2
R16   S14            t      −>  S15         —       2
R17   S15            e      −>  S16         —       2
R18   S16            r      −>  S17         —       2
R19   S17            n      −>  S18         —       2

Step 4: Resolve pattern collisions.

Transition rule 2 (priority 1) and transition rule 5 (priority 2) collide. Rule 2 involves a transition to state S1, and rule 5 involves a transition to state S4. There is one transition from state S1, namely rule 3. A copy of rule 3 is therefore created, with the current state replaced by S4:
R20 S4 e −> S2 — 2

Transition rule 2 (priority 1) and transition rule 15 (priority 2) collide. Rule 2 involves a transition to state S1, and rule 15 involves a transition to state S14. There is one transition from state S1, namely rule 3. A copy of rule 3 is therefore created, with the current state replaced by S14:
R21 S14 e −> S2 — 2

Transition rule 2 (priority 1) and transition rule 16 (priority 2) collide. Rule 2 involves a transition to state S1, and rule 16 involves a transition to state S15. There is one transition from state S1, namely rule 3. A copy of rule 3 is therefore created, with the current state replaced by S15:
R S15 e −> S2 — 2

However, a transition rule with the same current state and input already exists, namely rule 17. The copied rule involves a transition to state S2, whereas rule 17 involves a transition to state S16. There is one transition from state S2, namely rule 4. The copied rule is therefore removed, and a copy of rule 4 is created with the current state replaced by S16:
R22 S16 s −> S3 — 2

No other collisions are found. Transition rule list after all collisions have been resolved:

Rule  Current state  Input  −>  Next state  Output  Priority
R1    *              *      −>  S0          —       0
R2    *              t      −>  S1          —       1
R3    S1             e      −>  S2          —       2
R4    S2             s      −>  S3          —       2
R5    S3             t      −>  S4          —       2
R6    S4             i      −>  S5          —       2
R7    S5             n      −>  S6          —       2
R8    S6             g      −>  S7          —       2
R9    S4             c      −>  S8          —       2
R10   S8             a      −>  S9          —       2
R11   S9             s      −>  S10         —       2
R12   S10            e      −>  S11         —       2
R13   *              p      −>  S12         —       1
R14   S12            a      −>  S13         —       2
R15   S13            t      −>  S14         —       2
R16   S14            t      −>  S15         —       2
R17   S15            e      −>  S16         —       2
R18   S16            r      −>  S17         —       2
R19   S17            n      −>  S18         —       2
R20   S4             e      −>  S2          —       2
R21   S14            e      −>  S2          —       2
R22   S16            s      −>  S3          —       2

Variations described for the present invention can be realized in any combination desirable for each particular application. Thus particular limitations and/or embodiment enhancements described herein, which may have particular advantages for a particular application, need not be used for all applications. Also, not all limitations need be implemented in methods, systems and/or apparatus including one or more concepts of the present invention. Methods may be implemented as signal methods employing signals to implement one or more steps. Signals include those emanating from the Internet, etc.

The present invention can be realized in hardware, software, or a combination of hardware and software. A visualization tool according to the present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system, or other apparatus adapted for carrying out the methods and/or functions described herein, is suitable. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system, is able to carry out these methods.

Computer program means or computer program in the present context include any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after conversion to another language, code or notation, and/or reproduction in a different material form.

Thus the invention includes an article of manufacture which comprises a computer usable medium having computer readable program code means embodied therein for causing a function described above. The computer readable program code means in the article of manufacture comprises computer readable program code means for causing a computer to effect the steps of a method of this invention. Similarly, the present invention may be implemented as a computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing a function described above. The computer readable program code means in the computer program product comprises computer readable program code means for causing a computer to effect one or more functions of this invention. Furthermore, the present invention may be implemented as a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for causing one or more functions of this invention.

It is noted that the foregoing has outlined some of the more pertinent objects and embodiments of the present invention. This invention may be used for many applications. Thus, although the description is made for particular arrangements and methods, the intent and concept of the invention is suitable and applicable to other arrangements and applications. It will be clear to those skilled in the art that modifications to the disclosed embodiments can be effected without departing from the spirit and scope of the invention. The described embodiments ought to be construed to be merely illustrative of some of the more prominent features and applications of the invention. Other beneficial results can be realized by applying the disclosed invention in a different manner or modifying the invention in ways known to those familiar with the art.

1. An apparatus for detecting a pattern in a data stream comprising a pattern matching device for receiving the data stream, the pattern matching device comprising at least one rule engine, said at least one rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, the apparatus arranged to pass the data stream to said at least one rule engine, and further arranged to output a signal indicating a pattern match when a state transition rule indicates a pattern match.
2. An apparatus according to claim 1, further comprising a pattern distribution device arranged to receive the patterns, to distribute the patterns across a plurality of pattern collections, and to convert each pattern collection into a plurality of state transition rules.
3. An apparatus according to claim 2, wherein the pattern distribution device is arranged to distribute the patterns substantially evenly across the plurality of pattern collections.
4. An apparatus according to claim 2, wherein the pattern distribution device is arranged, when distributing the patterns across the plurality of pattern collections, to distribute the patterns according to commonality and conflict between patterns.
5. An apparatus according to claim 1, further comprising a results processor for receiving output from said at least one rule engine, the results processor arranged to determine if a pattern match has occurred.
6. An apparatus according to claim 1, wherein at least one of the state transition rules includes a character class component.
7. An apparatus according to claim 1, wherein the pattern matching device comprises a plurality of rule engines.
8. An apparatus according to claim 7, wherein the rule engines are arranged in at least one pair of rule engines, with said at least one pair of rule engines processing alternate portions of the data stream.
9. An apparatus according to claim 8, further comprising a results processor for receiving output from said at least one rule engine, the results processor arranged to determine if a pattern match has occurred, wherein the results processor is arranged to combine the outputs of said at least one pair of rule engines.
10. A method for detecting a pattern in a data stream comprising receiving the data stream, running at least one rule engine, said at least one rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, passing the data stream to said at least one rule engine, and outputting a signal indicating a pattern match when a state transition rule indicates a pattern match.
11. A method according to claim 10, further comprising receiving the patterns, distributing the patterns across a plurality of pattern collections, and converting each pattern collection into a plurality of state transition rules.
12. A method according to claim 11, wherein the step of distributing the patterns across the plurality of pattern collections distributes the patterns substantially evenly across the plurality of pattern collections.
13. A method according to claim 11, wherein the step of distributing the patterns across the plurality of pattern collections is executed by an algorithm which distributes the patterns according to commonality and conflict between patterns.
14. A method according to claim 10, further comprising processing the output from said at least one rule engine to determine if a pattern match has occurred.
15. A method according to claim 10, wherein at least one of the state transition rules includes a character class component.
16. A method according to claim 10, comprising running a plurality of rule engines.
17. A method according to claim 16, wherein the rule engines are arranged in at least one pair of rule engines, with said at least one pair of rule engines processing alternate portions of the data stream.
18. A method according to claim 17, further comprising processing the output from said at least one rule engine to determine if a pattern match has occurred, wherein the processing of the outputs of the rule engines comprises combining the outputs of said at least one pair of rule engines.
19. A computer program product on a computer readable medium for controlling apparatus for detecting a pattern in a data stream, the computer program product comprising instructions for receiving the data stream, running at least one rule engine, said at least one rule engine operating under a plurality of state transition rules encoding a plurality of patterns, a first state transition rule including a wildcard state component and a wildcard input component, a second state transition rule including a wildcard state component and a specified input component, and a third state transition rule including a specified state component and a specified input component, the first, second and third rules having differing priorities, and at least one state transition rule including an output component indicating a pattern match, passing the data stream to said at least one rule engine, and outputting a signal indicating a pattern match when a state transition rule indicates a pattern match.
20. A computer program product according to claim 19, further comprising instructions for receiving the patterns, distributing the patterns across a plurality of pattern collections, and converting each pattern collection into a plurality of state transition rules.
21. A computer program product according to claim 20, wherein the step of distributing the patterns across the plurality of pattern collections distributes the patterns substantially evenly across the plurality of pattern collections.
22. A computer program product according to claim 20, wherein the step of distributing the patterns across the plurality of pattern collections is executed by an algorithm which distributes the patterns according to commonality and conflict between patterns.
23. A computer program product according to claim 19, further comprising instructions for processing the output from said at least one rule engine to determine if a pattern match has occurred.
24. A computer program product according to claim 19, wherein at least one of the state transition rules includes a character class component.
25. A computer program product according to claim 19, comprising instructions for running a plurality of rule engines.
26. A computer program product according to claim 25, wherein the rule engines are arranged in at least one pair of rule engines, with said at least one pair of rule engines processing alternate portions of the data stream.
27. A computer program product according to claim 26, further comprising instructions for processing the output from said at least one rule engine to determine if a pattern match has occurred, wherein the processing of the outputs of the rule engines comprises combining the outputs of said at least one pair of rule engines.